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Abstract 

In this paper a new dissimilarity measure to identify groups of 
assets dynamics is proposed. The underlying generating process is 
assumed to be a diffusion process solution of stochastic differential 
equations and observed at discrete time. The mesh of observations 
is not required to shrink to zero. As distance between two observed 
paths, the quadratic distance of the corresponding estimated Markov 
operators is considered. Analysis of both synthetic data and real fi- 
nancial data from NYSE/NASDAQ stocks, give evidence that this 
distance seems capable to catch differences in both the drift and dif- 
fusion coefficients contrary to other commonly used metrics. 

keywords Clustering of time series; discretely observed diffusion pro- 
cesses; financial assets, markov processes. 

1 Introduction 

In recent years, there has been a lot of interest in mining time series data. In 
particular, financial data are among the most studied data. Although many 
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measures of dissimilarity are available in the literature (see e.g. Liao, 2005, 
for a review) most of them ignore the underlying structure of the stochastic 
model which drives the data. Among the few measures which consider the 
properties of the models we can mention Hirukawa (2006) which considers 
non-gaussian locally stationary sequences, Corduas and Piccolo (2008) who 
proposed an autoregressive metric as a distance between ARIMA models, 
and several information measures based on the the estimated densities of the 
processes (see e.g. Kakizawa et. al, 1998). 

Needless to say, starting from the Black and Scholes (1973) and Merton 
(1973) model, most of models of modern finance rely on continuous time pro- 
cesses. In particular, in most of the cases the dynamic of underlying process 
used in option pricing is assumed to be a diffusion process solution to some 
stochastic differential equations. This paper proposes a dissimilarity mea- 
sure which is particularly taylored to discretely observed diffusion processes. 
This measure is based on a new application of the results by Hansen et. al 
(1998) on identification of diffusion processed observed at discrete time when 
the time mesh A between observations is not necessarily shrinking to zero. 
The theory proposed in Hansen et al. (1998) has been used in Kessler and 
S0rensen (1999) and Gobet et al. (2004) in parametric and non parametric 
estimation of diffusion processes respectively. The theory is based on the fact 
that, when the process is not observed at high frequency, i.e. A — > 0, the ob- 
served data become a true Markov process for which it is possible to identify 
the Markov operator Pa- The continuous time model is instead characterized 
by the inifinitesimal generator Lh^a, where b and a are, respectively, the drift 
and diffusion coefficients of the process. These two operators are equivalent, 
in the sense of functional analysis, so if one can estimate the Markov opera- 
tor from the data it is also possible to identify the process and in particular 
the couple {b,a). The identification step of this procedure, needs some care 
(see e.g. Gobet et. al, 2004). In the present paper, we instead rely on the 
Markov operator only and use it to build a measure of dissimilarity between 
two observed processes. Some form of ergodicity or stationarity of the under- 
lying process is usually required although these hypothesis can be relaxed in 
several directions as, for example, mentioned in Kessler and S0rensen (1999). 

The paper is organized as follows. Section [2] introduces the model and 
the assumptions. The Markov operator is presented in Section |3] Section |4] 
studies the performance of the method. First, the behaviour of the operator 
is analyzed on simulated paths when data belong to the same hypothetical 
groups. Finally, real data from the NYSE/NASDAQ are analyzed. All the 
results include a comparison with other three dissimilarity measures, namely, 
the Euclidean distance, the short-time series distance and the dynamic time 
warping distance. All plots and figures are contained after the references in 
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Section [S] 



2 Model and assumptions 

Let / = (/, r), — oo ^ / < r ^ +00 be the state space of a time-homogeneous 
diffusion process {Xt,t ^ 0} solution of a stochastic differential of the form 

dXt = b{Xt)dt + a{Xt)dWt (1) 

In the expression Q, 6 : / ^ M and a : I ^ (0, 00) represent drift and 
diffusion coefficient, while Wt is a standard brownian motion. 

Assumption 1 The drift and diffusion coefficient are such that the stochas- 
tic differential equation ([T| admits a unique weak solution Xt. 

Let us introduce the scale function and speed measure, defined respectively 
as 



with X any value in the state space {l,r), and 

m{x) = \ , . ■ (3) 
Assumption 2 We assume that 

/r 
m{x)dx < 00 . 

Let, X* be an arbitrary point in the state space of Xt such that 

/X* 
s{x)dx = —00 . 

// one or both of the above integrals are finite, the corresponding boundary is 
assumed to be instantaneously reflecting. 



If the Assumption [T]j2] are satisfied, then exists a unique ergodic process 
Xt solution for the stochastic differential equation ([T]), with invariant law 

m{x) e^p{2/;j|))dl/} 
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3 The Markov operator 



Consider now the regularly sampled data Xi = X{iA), i = 0, . . . ,N, from 
the sample path of Xt, where A > and not shrinking to and such that 
T = A^A. The process {Xj}j=o,...,Af is a Markov process and under mild 
regularity conditions, all the mathematical properties are embodied in the 
transition operator 

Pa/(x) = E{/(X,)|X,_i = x}. 

Notice that Pa depends on the transition density between Xi and Xj_i, 
so we put explicitly the dependence on A in the notation. This operator 
is associated with the infinitesimal generator of the diffusion Lh ^^ which is 
the following operator on the space of continuous and twice different iable 
functions 

L,,J{x) = ^nx) + b{x)f{x). 

When the invariant density /i = fJ,b,a{-) of the process Xt exists, the operator 
is unbounded but self-adjoint negative on L'^{fi) = {f '■ f |/pd/i < oo} 
and the functional calculus gives the correspondence (in terms of operator 
notation) 

Pa = exp{AL4 (5) 

This relation has been first noticed by Hansen et al. (1998) and Chen et al. 
(1997). It was then used in statistics to derive estimating functions based on 
the eigenvalues of the above problem by Kessler and S0rensen (1999). Indeed, 
to estimate parametrically the coefficients a{x) = ag{x) and b = bg{x) of ([T]) 
it suffices to notice that 

Lefix) = ^f"{x) + be{x)f'ix) 

can be seen as an eigenvalue problem LoipQ^x) = Kgipg{x) and the pair {ng, ipe) 
satisfies 

PAMX^) = E{V'e(X,+i)|X,} = eM^eA)MX^) • 

When the solution is available it is then possible to impose a set of moment 
condition from which estimating functions are obtained. More recently, under 
low sampling rate, the result ^ was used to estimate non parametrically the 
drift and diffusion coefficient by Gobet et al. (2004). Indeed, consider the 
explicit form of the invariant law /ib^o- in ^ and define S{x) = l/s'{x) = 
^a^{x)/ fj,h^o-{x) (see also Ait-Sahaha, 1996). Being z/i the largest negative 
eigenvalue of L^j^^^, the following eigenvalue problem can be written 

Lb,aUi{x) = — ('S'(a;)ni(x))' = uiui{x) 
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from which S{x)u[{x) = ui Jj^ ui{y)fib,„{y)dy. Finally, 



.2r^^ _ '^^i ji ui{y)iibAy)^y 



and 

Ui{x)u[{x)f^bA^) -u'l{x) Jl'ui{y)fib,^{y)dy 

When Pa can be estimated properly from the data, the pair {ui, vi) can be 
obtained as well and then plugging these values into the above expressions 
([7]) and ^ estimators of and cr(-) are obtained. 

In this paper, we propose to use an the estimator of Pa and from this 
build a distance between discretely observed diffusion processes. 

For a given L^-orthonormal basis j G J} of L^([/,r]), where J is an 
index set, following Gobet et. al (2004) it is possible to obtain an estimator 
Pa of < PA(l>jy4>k >At6^ with entries 

1 ^ 

(Pa),>(x) = — 5^{0,(x,_i)0fc(x,) + 0,(x,_i)0,(x,)}, j,keJ (8) 

i=l 

The terms (Pa)j,a; are approximations of < PA(pj,<Pk >Atb„, that is, the action 
of the transition operator on the state space with respect of the unknown 
scalar product < ■, ■ ^ and hence can be used as "proxy" of the probability 
structure of the model. 

We remark that, like the invariant density fib,a, the Markov operator it- 
self cannot perfectly identify the underlying process, in the sense that, for 
some (&i,cri) there might exist another couple (62,(72) such that yUb^^o-^ (a;) = 
/ib2,o-2(x) and the same applies to the infinitesimal generator and hence to 
the Markov operator. So in this sense, the identification cannot be precise: 
unicity is not guaranteed. Nevertheless, the measure proposed in the next 
section, can only help in finding similarities of two (or more) processes in 
terms of the action of the Markov operator on the approximating space gen- 
erated by the basis of above. Indeed, the Markov operator also takes 
into account the transition properties of the observed sequence which is the 
natural way to make inference from discretely observed diffusion processes. 



4 Analysis of performance of the proposed 
method 

In this section we consider four different distances to be used in both the 
analysis of synthetic data and on real financial time series. For an updated 
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review on time series dissimilarity measures see Liao (2005). In the following, 
we will denote by X = {Xi, i = 1, . . . , N} snad Y = {Yi,i = 1, . . . , N} two 
generic paths. We will consider the following measures 

The Markov-Operator distance Following the suggestion in Reifi(2003) 
we use a basis of 20 orthonormal B-splines on a compact support (see Ramsay 
and Silverman, 2005) of degree 10. As compact support we consider the 
observed support of all diffusion paths enlarged by 10%. In the analysis of 
synthetic data, the support is just the interval [0,1]. Then we define the 
Markov Operator distance as follows 



Short-Time-Series distance Proposed by MoUer-Levet et al. (2001) is 
based on the idea to consider each time series as a piecewise linear function 
and compare the slopes between all the interpolants. It reads as 



This measure is essentially design to discover similarities in the volatility 
between two time series regarding of the average level of the process, i.e. one 
process and a shifted version of it will have zero distance. 

The Euclidean distance The usual Euclidean distance is one of the most 
used in the applied literature, in particular in one step ahead prediction. We 
will calculate it as follows 



and use only for comparison purposes. 

Dynamic Time Warping distance The Euclidean distance is very sen- 
sitive to distortion in time axis and may lead to poor results for sequences 
which are similar, but locally out of phase (Corduas, 2007). The Dynamic 



dMo{x,Y) = J2 [(^A),,fc(x) - {pA)jAy)? 



where {PA)j,k{X) is calculated as in ([s]). 




N 
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Time Warping (DTW) was introduced originally in speech recognition anal- 
ysis (Sakoe and Ciba, 1978; Wang and Gasser, 1997). DTW allows for non- 
linear alignments between time series not necessarily of the same length. 
Essentially, all shiftings between two time series are attempted and each 
time a cost function is applied (e.g. a weighted Euclidean distance between 
the shifted series). The minimum of the cost function over all possible shift- 
ing is the dynamic warping distance d^Tw- In our applications we use the 
Euclidean distance in the cost function and the algorithm as implemented in 
the R package dtw (Giorgino, 2007). 

4.1 Analysis of synthetic data 

We simulate 10 paths Xj, i = 1, . . . , 10, according to the combinations of 
drift bi and diffusion coefficients (Tj, i = 1, ... ,4 presented in the following 
table 





o-i(x) 


0-2(2;) 


0-3 (x) a^{x) 


h 




XIO, XI 




X5 








X2,X3 


X4 


h 






X6, X7 




hi 








X8 



where 

bi{x) = l-2x, b2ix) = 1.5(0.9-x), b^^^x) = 1.5(0.5-x), b^ix) = 5(0.05-x) 
and 

<Ji{x) = 0.5 + 2x(l — x), <72{x) = a/0.55x(1 — x) 

asix) = V0.1x(l - x), a^ix) = ^0.8x{l - x) 

The process X9 = 1 — XI, hence it has drift — 6i(x) and the same quadratic 
variation of XI and XIO. 

We simulate each path using (second) Milstein scheme (see e.g. Kloden et 
al. 1999 or lacus, 2008) with time lag 5 = le — A each path of length 50,000. 
Observations have been then resampled at rate A = 0.1, so the observed path 
used in the estimation have length N = 500. The sample path of process X9 
is a reflection of the sample path of XI around 1, i.e. X9 = 1 - XI. The final 
paths are reported in Figure [T] 

After applying the distance dMo, dsrs, dsuc and d^Tw we run hierarchi- 
cal clustering with complete linkage method. To make the output graphically 
comparable we rescale all distance matrixes to (0,1). This rescaling only gives 
the feeling of relative distance of observations, so numerical values are not 
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really comparable from one distance to another. Figure [2] shows the final 
classification. From the plots it appears quite evident, that apart from scal- 
ing, that dsTS and dEuc agrees but unfortunately in this example, do not 
correctly classify the paths. The processes XI and XIO are driven by the 
same stochastic differential equation although their initial values are differ- 
ent, while this difference increases the distances dsrs and dEuc the Markov 
operator distance dMo seems to correctly catch the similar effect of drift and 
diffusion coefficient on the path. Similarly for X9 which is just a reflection 
of XI around 1. In Figure [2] it easy to see that while dMo Puts XI and XIO 
together and this cluster together with X9, the other two distance put X9 
in a separate cluster which is then aggregated with trajectories in different 
clusters not really related to X9 in terms of drift and diffusion coefficients. 
Processes X2 and X3 are also driven by the same stochastic differential equa- 
tion and this is captured also by the other two distances although dpio Put 
X2 and X3 in the smaller cluster and then aggregate with X4 which has the 
same drift as X2 and X3 but a different diffusion coefficient. The other two 
methods put X3 and X4 together and then aggregate X2. A similar situation 
occurs for X6 and X7 which have the same stochastic differential equations 
which clearly separated by dMo and not for the other two distances. Fi- 
nally, dMo clearly separates X8 which is the real outlier in terms of drift and 
diffusion coefficient. This fact is not captured by the other measures. 

4.2 Analysis of real financial data 

We consider the time series of daily closing quotes, from 2006-01-03 to 2007- 
12-31, for the following 20 financial assets: Microsoft Corporation (MSOFT 
in the plots). Advanced Micro Devices Inc. (AMD), Dell Inc. (DELL), In- 
tel Corporation (INTEL), Hewlett-Packard Co. (HP), Sony Corp. (SONY), 
Motorola Inc. (MOTO), Nokia Corp. (NOKIA), Electronic Arts Inc. (EA), 
LG Display Co., Ltd. (LG), Borland Software Corp. (BORL), Koninklijke 
Philips Electronics NV (PHILIPS), Symantec Corporation (SYMATEC), JP- 
Morgan Chase & Co (JMP), Merrill Lynch & Co., Inc. (MLINCH), Deutsche 
Bank AG (DB), Citigroup Inc. (CITI), Bank of America Corporation (BAG), 
Goldman Sachs Group Inc. (GSACHS) and Exxon Mobil Corp. (EXXON). 
Quotes come from NYSE/NASDAQ. Source Yahoo.com. Missing values have 
been linearly interpolated. These assets come from both electronic hardware, 
appliance and software vendors or producers, financial institutions of differ- 
ent type and a petrol company. Figure |4] represents the 20 paths of the 
assets all on the same scale in order to make them comparable in a visual 
inspection. It is clear that some titles have larger volatility than others and 
possibly there some outlier in terms of both trend and volatility. For exam- 
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pie, looking at financial companies, one can notice that MLINCH, DB amd 
GSACHS, although at different volatility levels, all present the same (cyclic) 
drift behaviour over time. Further, CIT and BAG seems quite close in terms 
of volatility and drift. But visual inspection alone is not sufficient so try to 
discover clusters using the for distances introduced before. Figure |5] reports 
the four different dendrograms for the four metrics. While all methods seems 
to separate DB and GSACHS, only (Imo seems to collect most of financial 
companies in the same parent cluster. Our metrics clearly separates BORL 
(as an outlier or singleton) in a cluster very far form the other observed 
paths. Also (Itw and (Ieuc tend to separate BORL as well, but the identified 
cluster is closer to other observations than other clusters. The metric dsrs 
does not appear to give sharp indication on how to separate clusters. Indeed, 
if we decide to split the dendrogram into four clusters, duo separates BORL 
in one cluster, DB, GSACHS, MLINCH and EXXON in the secod cluster, a 
group of hardware producer (mostly) and a final group of less active financial 
assets (CITI, JPM) and appliance producers or hardware assemblers (SONY, 
PHILIPS, HP). EA goes together with SONY in all dendrograms, which is 
not an unrealistic evidence in that the company essentially produces software 
for game consoles. Figure [6] present the multidimensional scaling of the dis- 
tance matrix in which groups identified by the cluster are plotted with the 
same symbol and, when cluster contain more than two elements, the ellipsoid 
hull is also drawn. 

The present very superficial analysis of the clustering should not go in 
depth with financial implications of the results. Nevertheless, the conclusion 
of the analysis is that, although all metrics have pros and cons because 
they look at single different aspects, the Markov operator distance seems 
able to discriminate discrepancies in both volatility and drift of the observed 
processes. It also give a sharp indication on where to cut the dendrogram to 
obtain comparable groups. 
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5 Figures 
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Markov Operator Distance 
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Figure 2: Clustering according to different distances. 
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Multidimensional scaling 
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Figure 3: Multidimensional scaling representation of distance (Imo with 
points identified after cutting dendrogram 1 in Figure [2] into 4 groups. 
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Financial Time Series 




Time Time 

Figure 4: Paths of the 20 assets considered: from 2006-01-03 to 2007-12-31. 
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Markov Operator Distance 
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Figure 5: Clustering according to different distances. 
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Multidimensional scaling (IVIO) 
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Figure 6: Multidimensional scaling on the distance matrix duo- Observa- 
tions in the same cluster are plotted with the same symbol. If a cluster 
contains more than two observation, the ellipsoid hull is also represented. 
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